Supplemental Material for “A Practical Algorithm for Topic Modeling with Provable Guarantees”
Abstract
[Figure 1: Illustration of the Algorithm, panels (a) to (d)]

Recall that the correctness of the algorithm depends on the following lemma:

Lemma 1.1. The point $d_j$ found by the algorithm must be $\delta = O(\epsilon/\gamma^2)$-close to some vertex $v_i$. In particular, the corresponding $a_j$ $O(\epsilon/\gamma^2)$-covers $v_i$.

To prove this lemma, we first show that even if the previously found vertices are only $\delta$-close to some vertices, there is still another vertex that is far from the span of the previously found vertices.

Lemma 1.2. Suppose all previously found vertices are $O(\epsilon/\gamma^2)$-close to distinct vertices. Then there is a vertex $v_i$ whose distance from $\mathrm{span}(S)$ is at least $\gamma/2$.

To prove Lemma 1.2 we use a volume argument. First we show that the volume of a robust simplex cannot change by too much when its vertices are perturbed.

Suppose $\{v_1, v_2, \dots, v_K\}$ are the vertices of a $\gamma$-robust simplex $S$. Let $S'$ be a simplex with vertices $\{v'_1, v'_2, \dots, v'_K\}$, where each $v'_i$ is a perturbation of $v_i$ with $\|v'_i - v_i\|_2 \le \delta$. When $\sqrt{K}\,\delta < \gamma$, the volumes of the two simplices satisfy

$$\mathrm{vol}(S)\,(1 - 2\delta/\gamma)^{K-1} \le \mathrm{vol}(S') \le \mathrm{vol}(S)\,(1 + 4\delta/\gamma)^{K-1}.$$

Proof: As the volume of a simplex is proportional to the determinant of a matrix whose columns are the edges of the simplex, we first show the following perturbation bound for determinants.
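The argument above relies on two computable quantities: the volume of a simplex obtained from the determinant of its edge matrix, and the distance of a point from the span of previously found vertices. The following is a minimal pure-Python sketch of both, under stated assumptions. The names `det`, `simplex_volume`, `distance_from_span`, and `greedy_far_points` are hypothetical. `greedy_far_points` is only a simplified illustration of "pick a point far from span(S)" and is not the paper's full algorithm. For $K$ vertices in $\mathbb{R}^n$ the sketch uses the Gram form $\sqrt{\det(E^\top E)}/(K-1)!$, which reduces to $|\det E|/(K-1)!$ when the edge matrix $E$ is square.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def det(matrix: Sequence[Sequence[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # (numerically) singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def simplex_volume(vertices: Sequence[Sequence[float]]) -> float:
    """(K-1)-dimensional volume of the simplex with K vertices in R^n.

    The edges e_i = v_i - v_1 are the columns of E; the volume is
    sqrt(det(E^T E)) / (K-1)!, i.e. |det E| / (K-1)! when E is square.
    """
    base = vertices[0]
    edges = [[x - y for x, y in zip(v, base)] for v in vertices[1:]]
    gram = [[dot(e, f) for f in edges] for e in edges]
    return math.sqrt(max(det(gram), 0.0)) / math.factorial(len(edges))


def distance_from_span(point: Sequence[float],
                       basis: Sequence[Sequence[float]]) -> float:
    """Euclidean distance from `point` to the linear span of `basis`
    (modified Gram-Schmidt, then the norm of the residual)."""
    ortho: List[Vector] = []
    for b in basis:
        r = list(b)
        for q in ortho:
            c = dot(r, q)
            r = [x - c * y for x, y in zip(r, q)]
        norm = math.sqrt(dot(r, r))
        if norm > 1e-12:  # skip vectors already in the span
            ortho.append([x / norm for x in r])
    residual = list(point)
    for q in ortho:
        c = dot(residual, q)
        residual = [x - c * y for x, y in zip(residual, q)]
    return math.sqrt(dot(residual, residual))


def greedy_far_points(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Hypothetical simplified selection: repeatedly pick the index of the
    point farthest from the span of the points chosen so far."""
    chosen: List[int] = []
    for _ in range(k):
        span_pts = [points[j] for j in chosen]
        best = max(range(len(points)),
                   key=lambda i: distance_from_span(points[i], span_pts))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    triangle = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(simplex_volume(triangle))  # 0.5
    print(distance_from_span([1.0, 1.0, 1.0], triangle[1:]))  # 1.0
    pts = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [0.5, 1.0, 0.0]]
    print(greedy_far_points(pts, 3))  # [2, 1, 0]
```

The Gram-determinant form is used so the volume stays well defined when the $K$ vertices live in $\mathbb{R}^n$ with $n > K-1$. To probe the volume lemma numerically, one could move each vertex by at most $\delta$ and compare `simplex_volume` before and after.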
Similar Resources
A Practical Algorithm for Topic Modeling with Provable Guarantees
Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model learning have been based on a maximum likelihood objective. Efficient algorithms exist that attempt to approximate this objective, but they have no provable guarantees. Recently, algorithms have been introduced that provide provable bounds, but th...
From Correlation to Hierarchy: Practical Topic Modeling via Spectral Inference
Topic models were originally applied in text analysis for extracting high-level themes from documents, but they work equally well in any setting where users select items from an inventory. Recent work in spectral topic modeling has provided algorithms that operate only on easily-collected summary statistics, rather than exhaustively iterating over the full dataset. The “anchor word” algorithms ...
A Topic Modeling Approach to Rank Aggregation
We propose a new model for rank aggregation from pairwise comparisons that captures both ranking heterogeneity across users and ranking inconsistency for each user. We establish a formal statistical equivalence between the new model and topic models. We leverage recent advances in the topic modeling literature to develop an algorithm that can learn shared latent rankings with provable statistic...
A Topic Modeling Approach to Ranking
We propose a topic modeling approach to the prediction of preferences in pairwise comparisons. We develop a new generative model for pairwise comparisons that accounts for multiple shared latent rankings that are prevalent in a population of users. This new model also captures inconsistent user behavior in a natural way. We show how the estimation of latent rankings in the new generative model ...
Necessary and Sufficient Conditions for Novel Word Detection in Separable Topic Models
The simplicial condition and other stronger conditions that imply it have recently played a central role in developing polynomial time algorithms with provable asymptotic consistency and sample complexity guarantees for topic estimation in separable topic models. Of these algorithms, those that rely solely on the simplicial condition are impractical while the practical ones need stronger condi...
Publication date: 2013